COS 511 : Theoretical Machine Learning

Author

  • Soner Sevinc
Abstract

1 Review of the Bayes Algorithm

Last time we talked about the Bayes algorithm, in which we assign a prior $\pi_i$ to each expert. The algorithm maintains a weight $w_{t,i}$ for each expert, and the $\pi_i$ values serve as the initial weights. The experts predict distributions $p_{t,i}$ over the same set $X$, and the algorithm predicts a distribution $q_t$ that is a mixture of those distributions. To restate it mathematically:

$N$ experts; priors $\pi_i \ge 0$ with $\sum_i \pi_i = 1$; initialize $w_{1,i} = \pi_i$.
For $t = 1, \ldots, T$:
    expert $i$ predicts $p_{t,i}$ (a distribution over $X$)
    master predicts $q_t$, where $q_t(x) = \sum_{i=1}^{N} w_{t,i}\, p_{t,i}(x)$
    observe $x_t$ and update $w_{t+1,i} = w_{t,i}\, p_{t,i}(x_t) / q_t(x_t)$, where $q_t(x_t)$ serves as the normalization factor.

We showed:
$$ -\sum_t \ln q_t(x_t) \;\le\; \min_i \left[ -\sum_t \ln p_{t,i}(x_t) - \ln \pi_i \right]. $$

We derived the algorithm and its analysis by pretending that the data was generated by a random process in which (1) $\Pr[i^* = i] = \pi_i$ and (2) $\Pr[x_t \mid x_1^{t-1}, i^* = i] = p_i(x_t \mid x_1^{t-1}) = p_{t,i}(x_t)$. We then defined $q_t(x_t) = q(x_t \mid x_1^{t-1}) = \Pr[x_t \mid x_1^{t-1}]$.

In the last part, we proved the upper bound on the algorithm's loss by pretending that everything is random. Note that the algorithm's loss is bounded relative to the loss of the single best expert. Next we will look at the case where the comparison is not against a single best expert, but against a sequence that switches among a subset of the experts.
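To make the procedure concrete, here is a minimal Python sketch of the mixture algorithm as restated above. The expert predictors and the data stream are hypothetical placeholders, and the posterior-style weight update is the one implied by the random-process interpretation.

    import numpy as np

    def bayes_mixture(priors, expert_prob, xs):
        """Run the Bayes (mixture-of-experts) algorithm on an observed stream xs.

        priors      : length-N array with pi_i >= 0 summing to 1
        expert_prob : function (t, i, x) -> p_{t,i}(x)
        Returns the master's cumulative log loss, -sum_t ln q_t(x_t).
        """
        w = np.array(priors, dtype=float)          # w_{1,i} = pi_i
        loss = 0.0
        for t, x in enumerate(xs):
            p = np.array([expert_prob(t, i, x) for i in range(len(w))])
            q = float(np.dot(w, p))                # q_t(x_t) = sum_i w_{t,i} p_{t,i}(x_t)
            loss += -np.log(q)
            w = w * p / q                          # Bayes update; q_t(x_t) is the normalization
        return loss

    # Hypothetical check of the bound with two experts predicting bits.
    experts = [lambda x: 0.5, lambda x: 0.9 if x == 1 else 0.1]
    xs = [1, 1, 0, 1, 1]
    loss = bayes_mixture([0.5, 0.5], lambda t, i, x: experts[i](x), xs)
    best = min(-sum(np.log(experts[i](x)) for x in xs) for i in range(2))
    assert loss <= best - np.log(0.5) + 1e-9       # -sum ln q_t <= min_i [-sum ln p_{t,i} - ln pi_i]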


Similar Articles

Theoretical Machine Learning COS 511 Lecture #9

In this lecture we consider a fundamental property of learning theory: it is amenable to boosting. Roughly speaking, boosting refers to the process of taking a set of rough “rules of thumb” and combining them into a more accurate predictor. Consider for example the problem of Optical Character Recognition (OCR) in its simplest form: given a set of bitmap images depicting hand-written postal-cod...
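The excerpt is cut off before the algorithm itself, so as a rough illustration of combining "rules of thumb", here is a small AdaBoost-style sketch in Python; the stump set and toy data are hypothetical, and this is not necessarily the exact construction presented in that lecture.

    import numpy as np

    def adaboost(X, y, stumps, T=10):
        """Combine weak rules (stumps) into a weighted majority vote (AdaBoost-style sketch)."""
        y = np.asarray(y)
        D = np.full(len(X), 1.0 / len(X))                   # distribution over training examples
        vote = []                                           # list of (weight, rule) pairs
        for _ in range(T):
            errs = [float(np.sum(D * (np.array([h(x) for x in X]) != y))) for h in stumps]
            j = int(np.argmin(errs))                        # rule with the smallest weighted error
            eps = max(errs[j], 1e-12)
            if eps >= 0.5:
                break                                       # no rule beats random guessing; stop
            alpha = 0.5 * np.log((1 - eps) / eps)
            preds = np.array([stumps[j](x) for x in X])
            D = D * np.exp(-alpha * y * preds)              # upweight the misclassified examples
            D /= D.sum()
            vote.append((alpha, stumps[j]))
        return lambda x: int(np.sign(sum(a * h(x) for a, h in vote)))

    # Hypothetical toy data: the label is the sign of the first coordinate.
    X = [(-2, 1), (-1, -1), (1, 2), (2, -1)]
    y = [-1, -1, 1, 1]
    stumps = [lambda p, c=c, t=t: 1 if p[c] >= t else -1 for c in (0, 1) for t in (-1.5, 0, 1.5)]
    H = adaboost(X, y, stumps, T=5)
    print([H(p) for p in X])                                # reproduces the labels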


COS 511 : Theoretical Machine Learning

In other words, if ε ≤ 1/8 and δ ≤ 1/8, then PAC learning is not possible with fewer than d/2 examples. The outline of the proof is as follows: to prove that there exists a concept c ∈ C and a distribution D, we construct a fixed distribution D, but we do not fix the exact target concept c; instead, we choose c at random. If we get an expected probability of error over c, then there ...


COS 511 : Theoretical Machine Learning

Suppose we are given examples x1, x2 . . . , xm drawn from a probability distribution D over some discrete space X. In the end, our goal is to estimate D by finding a model which fits the data, but is not too complex. As a first step, we need to be able to measure the quality of our model. This is where we introduce the notion of maximum likelihood. To motivate this notion suppose D is distribu...
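Since the excerpt breaks off before the example, here is a minimal sketch of the maximum-likelihood idea for a distribution over a finite set: the empirical frequencies form the model that maximizes the log likelihood of i.i.d. samples. The sample values below are made up for illustration.

    from collections import Counter
    import math

    # Hypothetical i.i.d. samples from an unknown distribution D over a finite set X.
    samples = ['a', 'b', 'a', 'a', 'c', 'b']

    # The maximum-likelihood model assigns each outcome its empirical frequency,
    # which maximizes sum_j log p(x_j) over all distributions p on X.
    counts = Counter(samples)
    mle = {x: c / len(samples) for x, c in counts.items()}

    log_likelihood = sum(math.log(mle[x]) for x in samples)
    print(mle, log_likelihood)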


COS 511 : Theoretical Machine Learning

as the price relative, which is how much a stock goes up or down in a single day. S_t denotes the amount of wealth we have at the start of day t, and we assume S_1 = 1. We denote by w_t(i) the fraction of our wealth that we have in stock i at the beginning of day t, which can be viewed as a probability distribution since ∀i, w_t(i) ≥ 0 and ∑_i w_t(i) = 1. We can then derive the total wealth in stock i a...
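As a small illustration of the wealth recursion described in this excerpt, the sketch below tracks S_t for a rebalanced portfolio; the price-relative notation x_t(i) and the example numbers are assumptions, since the excerpt is cut off before that symbol is defined.

    import numpy as np

    def total_wealth(price_relatives, weights):
        """Compute final wealth, assuming S_1 = 1 and S_{t+1} = S_t * sum_i w_t(i) x_t(i).

        price_relatives, weights : arrays of shape (T, N); each row of weights sums to 1.
        """
        S = 1.0
        for x_t, w_t in zip(price_relatives, weights):
            S *= float(np.dot(w_t, x_t))     # wealth multiplies by the portfolio's daily return
        return S

    # Hypothetical two-stock example with a constant 50/50 rebalanced portfolio.
    x = np.array([[1.1, 0.9], [0.8, 1.2], [1.05, 1.0]])
    w = np.full_like(x, 0.5)
    print(total_wealth(x, w))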


COS 511 : Theoretical Machine Learning

Last class, we discussed an analogue of Occam's Razor for infinite hypothesis spaces that, in conjunction with VC-dimension, reduced the problem of finding a good PAC-learning algorithm to the problem of computing the VC-dimension of a given hypothesis space. Recall that VC-dimension is defined using the notion of a shattered set, i.e. a subset S of the domain such that Π_H(S) = 2^{|S|}. In this le...
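To make the shattering condition Π_H(S) = 2^{|S|} concrete, here is a small Python check for a finite hypothesis class; the threshold class in the example is a standard illustration (VC-dimension 1) and is not taken from the lecture itself.

    def shatters(hypotheses, S):
        """Return True if the class shatters S, i.e. Pi_H(S) = 2^|S|.

        hypotheses : iterable of functions mapping domain points to {0, 1}
        S          : list of domain points
        """
        behaviors = {tuple(h(x) for x in S) for h in hypotheses}   # Pi_H(S): distinct labelings of S
        return len(behaviors) == 2 ** len(S)

    # Hypothetical example: threshold functions h_t(x) = 1{x >= t} shatter any single point
    # but no pair of distinct points, so their VC-dimension is 1.
    thresholds = [lambda x, t=t: int(x >= t) for t in (-1, 0, 1, 2)]
    print(shatters(thresholds, [0.5]))          # True
    print(shatters(thresholds, [0.5, 1.5]))     # False: the labeling (1, 0) is unachievable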




Publication date: 2008